Assembly Language
© Copyright Brian Brown, 1988-2000. All rights reserved.


16-32 BIT MICROPROCESSORS

This module is the individual work of Brian Brown. It may not be copied or used in any form without his permission.


OBJECTIVE
The study of advanced microprocessor architectures will help students understand complex systems and enable them to write efficient software.


The Intel 80x86 Processor Family
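The main members of the Intel 80x86 family are:

	Processor     Address Bus Size   Data Bus Size   Clock Rate
	8088          20 bits (1 MB)     8 bits          4.77 MHz
	8086          20 bits (1 MB)     16 bits         4.77 MHz
	80286         24 bits (16 MB)    16 bits         8-12 MHz
	80386         32 bits (4 GB)     32 bits         16-33 MHz
	80486         32 bits (4 GB)     32 bits         33-66 MHz

	(The lower-cost 80386SX has a 16 bit data bus and a 24 bit address bus.)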
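The memory sizes in the address bus column follow from the fact that an n bit address bus can select 2^n bytes. A minimal C sketch of that arithmetic (the function name is illustrative, not part of any library):

```c
#include <stdint.h>

/* Number of bytes reachable with an address bus of the given width: 2^bits.
   64-bit arithmetic is used so that a 32 bit bus (4 GB) does not overflow. */
static uint64_t address_space_bytes(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;
}
```

For example, 20 address lines give 1,048,576 bytes (1 MB) and 32 lines give 4,294,967,296 bytes (4 GB).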


iAPX386
The programming model is considerably extended over that of the iAPX88.

The on-chip instruction queue is 16 bytes long. As the average instruction is 3.2 bytes long, about 5 instructions (16 / 3.2) can be held pre-fetched. The iAPX386 has a 3 stage internal execution pipeline for decoding and executing instructions.

Long integers (32 and 64 bits) are now supported.

Virtual memory support is enhanced by instruction restart: if an instruction cannot complete (for example, because it refers to a page that is not present in memory), it can be restarted once the cause has been dealt with.

The processor can run at four privilege levels (0 to 3); the IOPL field in the flags register sets the level a task needs in order to perform I/O. Extra protection is provided by descriptor tables, set up on a task by task basis, which record each segment's privilege level and access rights (execute, read only, read/write). Each task can also have an I/O permission bitmap, which determines which ports it can access. In addition, the paging unit assigns privileges and access rights to each page.

The use of an internal memory management unit imposes no delay in performing the address translation, thus the system runs with a four stage bus cycle.

With 32 bit data and address buses, the iAPX386 can address 4 gigabytes of physical memory. At a clock rate of 30 MHz, the clock cycle time is 33.3 ns. The processor supports address pipelining: the address of the next bus cycle is placed on the address bus one clock before the end of the current cycle. This gives memory more time to respond to the address, allowing zero wait state operation where one wait state might otherwise have been necessary.
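The cycle time quoted above is simply the reciprocal of the clock rate. A small C sketch of the calculation, assuming the clock rate is given in MHz (the helper name is hypothetical):

```c
/* Clock period in nanoseconds: period = 1 / f, and 1 / (1 MHz) = 1000 ns. */
static double clock_period_ns(double clock_mhz)
{
    return 1000.0 / clock_mhz;
}
```

At 30 MHz this gives about 33.3 ns; at the 33 MHz maximum speed it gives about 30.3 ns.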


The 80386 programming model
The iAPX386 processor is housed in a 132-pin grid array package and manufactured using the CHMOS III process. It operates at speeds of up to 33 MHz, with instruction rates of up to 8 MIPS. The programming model implements paging and multi-tasking.

[Figure: the 80386 programming model]

CPU Registers

	General Purpose (32 bits wide on the 80386 as EAX, EBX, ECX and EDX;
	the 16 bit names refer to their low halves)
		AX	general data, results of operations, multiply/divide
		BX	general data, base register for addressing memory
		CX	general data, count for loop, string and shift/rotate instructions
		DX	general data, multiply/divide instructions, I/O port addresses

	Segment Registers
		CS	code
		DS	data
		ES	extra data
		FS/GS	additional data segments (new on the 80386)
		SS	stack

	String (Index) Registers
		SI	source index
		DI	destination index

	Pointer Registers
		SP	stack pointer
		BP	base pointer, used to reference variables on the stack
		IP	instruction pointer

[Figure: the flags register]

CARRY FLAG - Set by arithmetic instructions that generate a carry or borrow out of the most significant bit.
PARITY FLAG - Set by most instructions if the least significant byte of the result contains an even number of 1s.
AUXILIARY FLAG - Set on a carry or borrow between bits 3 and 4 of the result; used for BCD arithmetic.
ZERO FLAG - Set by most instructions if the result is zero.
SIGN FLAG - Most operations set this bit to the same value as the most significant bit of the result.
TRACE FLAG - Permits single stepping of programs. After each instruction, the processor generates a debug exception (exception 1).
INTERRUPT FLAG - When set, the processor recognises external interrupts on the INTR pin.
DIRECTION FLAG - Set and cleared using the STD and CLD instructions. Used in string processing: when clear, string instructions increment SI and DI; when set, they decrement them.
OVERFLOW FLAG - Most arithmetic instructions set this bit when the signed result is too large (or too small) to fit in the destination.
INPUT/OUTPUT PRIVILEGE LEVEL FLAGS - A two bit field, used in protected mode, giving the privilege level (0 to 3) needed to execute I/O instructions.
NESTED TASK FLAG - Used in protected mode; when set, it indicates that one system task has invoked another via a CALL instruction (or an interrupt), rather than a JMP.
RESUME FLAG - Used with the debug registers DR6 and DR7. When set, it temporarily disables debug faults, so that the instruction that caused a breakpoint can be restarted without triggering the same breakpoint again.
VIRTUAL 8086 MODE FLAG - When set in protected mode, the current task runs in virtual 8086 mode, allowing the 80386 to behave like a high-speed 8086.
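To make the bit layout concrete, here is a C sketch that tests individual flags in a 32 bit EFLAGS value, extracts the two bit IOPL field and computes the parity flag the way the processor defines it. The bit positions are those of the 80386 EFLAGS register; the helper names, and the simplified I/O privilege check, are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the 80386 EFLAGS register (reserved bits not listed). */
enum {
    FLAG_CF = 1u << 0,   /* carry */
    FLAG_PF = 1u << 2,   /* parity */
    FLAG_AF = 1u << 4,   /* auxiliary carry */
    FLAG_ZF = 1u << 6,   /* zero */
    FLAG_SF = 1u << 7,   /* sign */
    FLAG_TF = 1u << 8,   /* trace */
    FLAG_IF = 1u << 9,   /* interrupt enable */
    FLAG_DF = 1u << 10,  /* direction */
    FLAG_OF = 1u << 11,  /* overflow */
    FLAG_NT = 1u << 14,  /* nested task */
    FLAG_RF = 1u << 16,  /* resume */
    FLAG_VM = 1u << 17   /* virtual 8086 mode */
};

static bool flag_set(uint32_t eflags, uint32_t flag)
{
    return (eflags & flag) != 0;
}

/* IOPL is the two bit field in bits 13-12. */
static unsigned iopl(uint32_t eflags)
{
    return (eflags >> 12) & 3u;
}

/* PF rule: set when the least significant byte of a result holds an even
   number of 1 bits. */
static bool parity_flag(uint32_t result)
{
    unsigned ones = 0;
    for (uint32_t b = result & 0xFFu; b != 0; b >>= 1)
        ones += b & 1u;
    return (ones % 2) == 0;
}

/* Simplified check (assumption): I/O is permitted outright when CPL is
   numerically <= IOPL. When CPL > IOPL the real processor consults the
   task's I/O permission bitmap for IN/OUT; that is not modelled here. */
static bool io_privileged(unsigned cpl, unsigned iopl_level)
{
    return cpl <= iopl_level;
}
```

For example, an EFLAGS value of 0202h has only reserved bit 1 and the interrupt flag set, so flag_set(0x0202, FLAG_IF) is true and flag_set(0x0202, FLAG_CF) is false.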

The nested task flag is set to indicate that this task is nested inside another task. This means that the task state segment (TSS) of the current task holds a back link to the previous task.

The two bit IOPL field sets the privilege level a task needs in order to execute I/O instructions. The processor uses this to enforce security and protection amongst processes and peripheral devices.


PROTECTED MODE REGISTERS
The descriptor table registers and the task register hold pointers to the system tables that describe segments, stacks, tasks, exception handlers etc., and so determine the privileges given to each task.

[Figure: protected mode registers]

GLOBAL DESCRIPTOR TABLE REGISTER
Points to a general purpose table of segment descriptors, and can be used by all programs to reference segments of memory. It is mandatory for protected mode operation.

INTERRUPT DESCRIPTOR TABLE REGISTER
Points to a table of gate descriptors that locate the interrupt and exception handling routines. This replaces the interrupt vector table of the iAPX86.

LOCAL DESCRIPTOR TABLE REGISTER
Holds the address of a per-task table of segment descriptors. Each task can be assigned its own LDT, or share another task's LDT, to support multi-tasking features (shared data, code).

TASK REGISTER
Holds the selector of the task state segment (TSS) of the currently executing task, and so identifies that task.


CONTROL, DEBUG, TEST REGISTERS
The six debug registers (DR0-DR3, DR6 and DR7) provide on-chip support for debugging. Up to four linear address breakpoints can be specified in DR0-DR3; register DR7 is used to control the breakpoints, whilst DR6 holds their current status.

[Figure: control and debug registers]

The test registers (TR6 and TR7) are used to test the on-chip translation lookaside buffer (TLB). In paged mode, the iAPX386 caches the base addresses of the most recently used pages in this buffer. This speeds up memory accesses, as the processor does not need to read the page tables in memory to find where a page is located in system memory (except for pages not in the TLB).

When the processor accesses a page whose base address is not in the TLB, the base address is read from the page tables in system memory, and the TLB is updated at the same time. Whenever the CR3 register (which holds the base address of the page directory) is loaded, the TLB is flushed so that it reflects the current page table values.
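The TLB behaviour described above can be modelled in a few lines of C. This is a simulation only: it uses a small direct-mapped table for simplicity (the 80386's own TLB is organised differently), and walk_page_tables() and load_cr3() are hypothetical stand-ins for the page table walk and for the flush that accompanies loading CR3.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12u         /* 4 KB pages */
#define TLB_ENTRIES 32u         /* illustrative size only */

/* One cached translation: virtual page number -> page frame base. */
struct tlb_entry {
    bool     valid;
    uint32_t vpn;
    uint32_t frame_base;
};

static struct tlb_entry tlb[TLB_ENTRIES];
static unsigned tlb_misses;

/* Hypothetical page table walk: virtual page n maps to physical page n + 100h. */
static uint32_t walk_page_tables(uint32_t vpn)
{
    return (vpn + 0x100u) << PAGE_SHIFT;
}

/* Translate a linear address, using the TLB when the page is cached. */
static uint32_t tlb_translate(uint32_t linear)
{
    uint32_t vpn = linear >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];   /* direct-mapped slot */

    if (!e->valid || e->vpn != vpn) {
        /* Miss: read the base from the page tables and update the TLB. */
        tlb_misses++;
        e->valid = true;
        e->vpn = vpn;
        e->frame_base = walk_page_tables(vpn);
    }
    return e->frame_base | (linear & ((1u << PAGE_SHIFT) - 1u));
}

/* Loading CR3 selects new page tables, so every cached entry is discarded. */
static void load_cr3(void)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        tlb[i].valid = false;
}
```

In this model the first access to a page costs a page table walk, later accesses to the same page are served from the TLB, and after load_cr3() the next access walks the tables again.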


EXCEPTIONS
Hardware Interrupts

Software Exceptions

[Figure: exceptions]


Vector    Description                   Class         Error Code
0         Divide error                  Fault         No
1         Debug                         Fault/Trap    No
2         NMI                           Interrupt     No
3         Breakpoint                    Trap          No
4         Overflow                      Trap          No
5         Bound range exceeded          Fault         No
6         Invalid opcode                Fault         No
7         Device not available          Fault         No
8         Double fault                  Abort         Yes
9         Coprocessor segment overrun   Abort         No
10        Invalid TSS                   Fault         Yes
11        Segment not present           Fault         Yes
12        Stack fault                   Fault         Yes
13        General protection            Fault         Yes
14        Page fault                    Fault         Yes
15        Reserved
16        Coprocessor error             Fault         No
17-31     Reserved
32-255    Maskable interrupts and INT n vectors
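An exception handler must know whether the processor pushed an error code, because that code has to be removed from the stack before the handler returns with IRET. A minimal C helper encoding the Error Code column of the table above for the 80386 (the function name is illustrative):

```c
#include <stdbool.h>

/* True if the 80386 pushes an error code for this exception vector:
   double fault (8) and vectors 10-14 (invalid TSS to page fault). */
static bool pushes_error_code(unsigned vector)
{
    return vector == 8 || (vector >= 10 && vector <= 14);
}
```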


MEMORY MANAGEMENT
Memory management is a general term covering all the techniques by which an address generated by the CPU is translated into the actual address of the data in memory.

[Figure: memory management]

The address generated by the processor is known as a logical address, as it is the address seen by the programmer (or referenced by the program).

The actual memory supplied in a computer system is known as physical memory, and is accessed using a physical address.

The purpose of address translation is to change the logical address generated by the processor into the physical address for the system memory.

The iAPX386 can access 4Gbytes of logical memory. Using address translation techniques, a program could think it was executing within the top 64k block of this address space, but in reality be executing within the lowest 64k block.

The principle of address translation is important for multi-user systems, which need to relocate user tasks anywhere in available memory. It frees user programs from depending on particular physical memory addresses.

Address translation is done via a lookup table. The address coming from the processor serves as an index into the table, and the contents of the table entry specify the physical address. If the entire address were used as the index, the table would require 2^32 entries. To reduce the size of the table, only a portion of the address (a group of high order address lines) is used as the index. The table entry value is then combined with the rest of the address and presented to physical memory. The high order bits are chosen as the index because they change less often whilst a program is running, and are thus good candidates for caching.

[Figure: address translation table]

Note how the table entry specifies a base address, whilst the low order bits of the virtual address specify an offset value from that base address.

The block of memory spanned by a base address plus the range of offset values (base+0 to base+max_offset_value) is called a segment or page.

The number of low order address bits left over to form the offset determines the segment/page size (a 10 bit offset gives a 1024 byte segment).
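A lookup-table translation of this kind can be sketched in C. The sketch assumes a tiny 14 bit virtual address split into a 4 bit table index and a 10 bit offset (giving 1024 byte pages); translation_table and table_translate() are hypothetical names, and a real memory management unit performs these steps in hardware.

```c
#include <stdint.h>

#define OFFSET_BITS 10u                   /* 10 bit offset -> 1024 byte pages */
#define PAGE_SIZE   (1u << OFFSET_BITS)
#define TABLE_SIZE  16u                   /* 4 index bits -> 16 table entries */

/* Hypothetical translation table: entry i holds the physical base
   address of virtual page i. */
static uint32_t translation_table[TABLE_SIZE];

/* Virtual addresses are assumed to be 14 bits wide; higher bits are ignored. */
static uint32_t table_translate(uint32_t virtual_addr)
{
    uint32_t index  = (virtual_addr >> OFFSET_BITS) & (TABLE_SIZE - 1u); /* high order bits */
    uint32_t offset = virtual_addr & (PAGE_SIZE - 1u);                   /* low order bits  */
    return translation_table[index] + offset;
}
```

Setting translation_table[15] to 0 makes the top 1K block of the virtual space map onto the lowest 1K block of physical memory, a scaled-down version of the 64k example given earlier.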

In the majority of systems, the logical and physical addresses are different.

In simple 8-bit microprocessor systems, the logical address may have a 1:1 correspondence with the physical address (6802, Z80, 8088).


SEGMENTATION
Segmentation is a technique which involves having all of a program's code and data resident in RAM at run time. For a given system, this limits the number of programs that can be run simultaneously. Segment sizes can differ from program to program, which means the operating system must spend considerable time managing the memory system.

The most common problem associated with segmented memory is fragmentation. This occurs when running programs release their segment space, but the freed space is scattered over the entire address range. Thus, there could be 1 MB of free RAM, but it consists of many small blocks scattered over a 4 MB address range. A program requiring 500K to run could not be loaded, as segmentation requires the memory to be contiguous (one large block). In this instance, the program cannot run even though there is sufficient memory.

To overcome this deficiency, the operating system employs a technique called compaction, which involves relocating existing segments so as to combine all the small free blocks into larger blocks, enabling waiting programs to be run.
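The fragmentation problem and its cure can be illustrated with a simple C model of a memory map. The map, block sizes and function names below are hypothetical; a real operating system must also update each relocated program's segment base addresses, which this sketch does not model.

```c
#include <stddef.h>

/* Hypothetical memory map entry: owner > 0 means in use, 0 means free.
   Sizes are in KB; blocks are listed in address order. */
struct block {
    int    owner;
    size_t size_kb;
};

/* Largest contiguous free region (adjacent free blocks are merged). */
static size_t largest_free_kb(const struct block *map, size_t n)
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        if (map[i].owner == 0) {
            run += map[i].size_kb;
            if (run > best)
                best = run;
        } else {
            run = 0;
        }
    }
    return best;
}

/* Compaction: slide every used block down towards address 0, keeping
   their order, and gather all free space into one block at the end.
   Returns the new number of entries in the map. */
static size_t compact(struct block *map, size_t n)
{
    size_t out = 0, free_kb = 0;
    for (size_t i = 0; i < n; i++) {
        if (map[i].owner == 0)
            free_kb += map[i].size_kb;
        else
            map[out++] = map[i];
    }
    if (free_kb > 0) {
        map[out].owner = 0;
        map[out].size_kb = free_kb;
        out++;
    }
    return out;
}
```

With used blocks of 300K, 400K and 100K interleaved with free blocks of 200K, 300K and 250K, 750K is free but the largest hole is only 300K, so a 500K program cannot be loaded; after compact() the free space forms a single 750K block.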


REAL MODE AND PROTECTED MODE
The 80386 powers up in real mode. This means it behaves, and calculates memory addresses, just like an 8086 processor. After the initial far jump following reset, memory access is restricted to the lowest 1 MB.

To gain access to all available memory, and enable all the sophisticated features of the 386, it must run in protected mode. This gives access to 4 GB of memory, virtual memory, separation of user tasks, protection between tasks, privileged instructions, newer 32 bit instructions and a wide range of other features.

The 386 has special registers provided to enable protected mode, and these (along with system tables which define each task to be run), must be set up prior to switching the processor into protected mode.


HOW SEGMENTATION WORKS IN REAL MODE
An 8086 real mode program is split into a minimum of THREE segments. A segment is a block of memory referenced by a segment register within the processor.

A segment has a maximum size of 64K. The THREE segments are named DATA, CODE and STACK, and are referenced by the DS, CS and SS registers respectively. Segments may also overlap in memory, and be of different sizes.

The STACK segment is used by instructions such as CALL, PUSH and POP, and during interrupt processing.

The CODE segment is used to store instructions (programs).

The DATA segment is used to hold variables and shared data.

Instructions work with one of the defined segments. For example, the instruction


	MOV	AX, 200

moves the constant value 200 into the AX register. The constant is held in the code segment, as part of the instruction itself.

Which segment does the following instruction reference?


	MOV	AX, [200]

Which segment does the following instruction reference?


	PUSH	AX

Which segment does the following instruction reference?


	MOV	AX, ES:[200]


GENERATING THE PHYSICAL ADDRESS
Let's now look at how the actual address is generated. As explained, all references to memory are relative to a segment register.

The 8086 uses a 20 bit address bus to generate an address in the range 00000-FFFFF. Yet, all registers in the 8086 are 16 bit registers. The question is,


	HOW is a 20 bit address generated from 16 bit registers????

ANSWER: All memory references consist of using TWO REGISTERS!

A segment register defines the base address, and another register is used to specify an offset. For example, the instruction


	MOV  AX, [02]

means move the value stored in memory location 02 (relative to DS) into the AX register. Let's assume that DS holds 0010 (hex) and that the value stored at physical address 00102 is 34. The physical address is then calculated as follows,


	1.	SHIFT THE SEGMENT REGISTER LEFT BY FOUR BITS (IN HEX, 
		APPEND ANOTHER 0)

			DS = 0010		DS = 00100

	2.	ADD THE OFFSET

			00100 + 02  = 00102

	The actual memory location referenced is therefore 00102, so AX ends up 
	with the value 34.

	CODE INSTRUCTIONS ARE REFERENCED USING CS:IP
	STACK IS REFERENCED USING SS:SP or SS:BP
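The two steps above amount to physical = segment * 16 + offset. A C sketch of the calculation (the function name is illustrative):

```c
#include <stdint.h>

/* Real mode physical address: shift the 16 bit segment left four bits
   (multiply by 16) and add the 16 bit offset. The 8086 has only 20
   address lines, so results above FFFFF wrap round; the mask models that. */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

This models the 8086; on an 80386 in real mode with address line A20 enabled, the carry out of bit 19 is not discarded. Note also that many segment:offset pairs name the same physical location: 0010:0002 and 0000:0102 both give 00102.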


PAGING
A logical address can be split into a page number and an offset within the page. Paging breaks memory into a number of fixed size blocks (typically 4K). A running program may need only a few of its pages in memory at any one time, because programs show locality of reference: they spend most of their time in a small portion of their address space (for example, a loop waiting for keyboard entry).

Such a technique allows more programs to be held in memory for a given RAM size than segmentation does. The remaining pages belonging to a program are stored on disk, and are normally loaded when referenced by the running program.

When a running program accesses a memory location outside of its current pages, a page fault occurs. This causes a processor exception, and the operating system then loads the required page from disk (swapping out another page if necessary) and restarts the faulting instruction.

This is called demand paging, as a new page is brought into memory when a page fault occurs. Each page has a modified bit associated with it, and the operating system uses this bit to determine whether it should be written back to disk when being swapped out.

Each page also has several bits which specify its age, and when determining which page to swap out, the LEAST RECENTLY USED page is chosen. This is called the LRU algorithm.
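Demand paging with LRU replacement and modified bits can be simulated in C. This is an idealised model: it keeps an exact last-use time for each page, whereas real systems usually approximate LRU from hardware-maintained bits, and FRAMES and all the names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define FRAMES 3u   /* illustrative: page frames available to the program */

struct frame {
    int      page;      /* -1 when the frame is empty */
    bool     modified;  /* set on write: must be written back when evicted */
    unsigned last_use;  /* time of the most recent reference */
};

struct pager {
    struct frame f[FRAMES];
    unsigned clock, faults, writebacks;
};

static void pager_init(struct pager *p)
{
    for (size_t i = 0; i < FRAMES; i++)
        p->f[i] = (struct frame){ .page = -1, .modified = false, .last_use = 0 };
    p->clock = p->faults = p->writebacks = 0;
}

/* Reference a page. On a page fault, load it into an empty frame or
   replace the least recently used page, writing that page back to disk
   first if it has been modified. */
static void reference(struct pager *p, int page, bool write)
{
    size_t victim = 0;
    p->clock++;
    for (size_t i = 0; i < FRAMES; i++) {
        if (p->f[i].page == page) {              /* page already in memory */
            p->f[i].last_use = p->clock;
            p->f[i].modified |= write;
            return;
        }
    }
    p->faults++;                                 /* page fault */
    for (size_t i = 0; i < FRAMES; i++) {
        if (p->f[i].page == -1) { victim = i; break; }  /* prefer an empty frame */
        if (p->f[i].last_use < p->f[victim].last_use)
            victim = i;                          /* least recently used so far */
    }
    if (p->f[victim].modified)
        p->writebacks++;                         /* dirty page goes back to disk */
    p->f[victim] = (struct frame){ .page = page, .modified = write,
                                   .last_use = p->clock };
}
```

For the reference string 1, 2 (written), 3, 1, 4 with three frames, the reference to page 4 faults and evicts page 2, the least recently used page; because page 2 was modified, it is counted as a write-back to disk.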

Paging systems also employ other paging algorithms, another being anticipatory paging. This involves trying to anticipate which pages might be needed by a running program in the near future, and pre-loading them into RAM in order to reduce the overhead incurred by page faults.


iAPX386 Segmentation
When running in protected mode, each segment register has an associated internal segment descriptor that caches the descriptor of the selected segment. These descriptors are NOT programmer visible; their contents are loaded automatically by the processor, using the segment register value to select an entry in the descriptor tables pointed to by GDTR and LDTR. All memory accesses use these internal segment descriptors.

[Figure: 386 segment descriptors]

The internal segment descriptors are used to provide fast checking and address calculation. Without them, every memory reference would need to read the descriptor entries stored in the GDT/LDT tables pointed to by GDTR and LDTR. Note that these internal descriptor cache registers are reloaded whenever the segment register contents are altered.

The following diagram illustrates address translation in protected mode. The segment register value selects an entry in an operating system table, which holds the base address and access rights of that segment. The base address from the table is added to the offset to form the linear address; if paging is enabled, the paging unit then translates the linear address into a physical address.

[Figure: address translation tables]

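There are THREE operating system tables used in the segmentation scheme: the Global Descriptor Table (GDT), the Interrupt Descriptor Table (IDT) and the Local Descriptor Table (LDT).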

The GDT and LDT each hold up to 8192 entries (the IDT holds at most 256). The location of each table is held in a processor register, GDTR, IDTR and LDTR respectively. The instructions LGDT and LIDT load the base address and limit of the GDT and IDT into GDTR and IDTR; LLDT loads LDTR with a selector for an LDT descriptor held in the GDT. A segment cannot be accessed by a task unless there is a corresponding entry in the GDT/LDT. Each entry in the GDT or LDT is eight bytes long, and is called a segment descriptor.

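The value loaded into a segment register (a selector) holds the entry number in bits 15-3, a table indicator in bit 2 (0 = GDT, 1 = LDT) and the requested privilege level in bits 1-0. Because each descriptor is eight bytes long, the selectors for GDT entries at privilege level 0 are 08h, 10h, 18h, 20h etc. Entry 0 of the GDT is not used: selector 00h is the null selector and cannot be used to access memory. A program loads the segment registers with the correct selector to point to its associated segment descriptor.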


Segment Registers in Protected Mode
[Figure: 386 segment registers]

How do we find out the current privilege level?


	mov ax, cs	; copy the code segment selector into AX
	and ax, 03h	; keep bits 1-0: for CS these hold the current privilege level
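The masking trick works because the low two bits of the CS selector hold the current privilege level. A C sketch that splits a selector into its fields and computes where the descriptor sits in its table; the struct and function names are illustrative:

```c
#include <stdint.h>

/* Fields of a protected mode segment selector. */
struct selector {
    unsigned index;   /* bits 15-3: descriptor number within the table */
    unsigned ti;      /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1-0: requested privilege level */
};

static struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = sel >> 3;
    s.ti    = (sel >> 2) & 1u;
    s.rpl   = sel & 3u;
    return s;
}

/* Byte offset of the descriptor within its table: descriptors are eight
   bytes long, so this is the selector with its low three bits cleared. */
static uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel & ~7u);
}
```

For example, selector 1Bh decodes to entry 3 of the GDT with privilege level 3, and its descriptor starts 18h bytes into the GDT.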

Calculation of a segment's physical address

[Figure: calculating a segment's address]

The global descriptor table contains descriptors which are normally available to all tasks in the system. Generally, these describe segments used by the operating system.

The local descriptor table contains descriptors associated with a given task. The operating system assigns each task a separate LDT. The table provides a mechanism for isolating a given task's code and data segments from the rest of the operating system or other tasks.

A segment cannot be accessed by a task if its segment descriptor does not exist in either a LDT or GDT.

The basic format of a segment descriptor is,

[Figure: segment descriptor format]

If the descriptor is a code or data descriptor, it looks like,

[Figure: code and data segment descriptor format]
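Given the code/data descriptor layout, the base and limit can be extracted and the segment's linear address formed as base + offset (this is also the physical address when paging is disabled). A C sketch, assuming the 80386 descriptor layout and handling expand-up segments only; the names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

/* 80386 code/data descriptor layout (8 bytes):
     bytes 0-1  limit bits 15-0
     bytes 2-4  base bits 23-0
     byte  5    access rights: P (bit 7), DPL (bits 6-5), S and type
     byte  6    bits 3-0: limit bits 19-16; bit 7: granularity (G)
     byte  7    base bits 31-24 */
struct segment {
    uint32_t base;
    uint32_t limit;     /* highest valid offset, in bytes */
    unsigned dpl;       /* descriptor privilege level */
    bool     present;
};

static struct segment decode_descriptor(const uint8_t d[8])
{
    struct segment s;
    uint32_t raw_limit = d[0] | (d[1] << 8) | ((uint32_t)(d[6] & 0x0F) << 16);
    s.base = d[2] | (d[3] << 8) | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    /* With G set, the limit is counted in 4 KB units. */
    s.limit = (d[6] & 0x80) ? (raw_limit << 12) | 0xFFFu : raw_limit;
    s.dpl = (d[5] >> 5) & 3u;
    s.present = (d[5] & 0x80) != 0;
    return s;
}

/* Linear address = base + offset, after the limit check. Returns false
   where the processor would instead raise an exception. */
static bool linear_address(const struct segment *s, uint32_t offset, uint32_t *out)
{
    if (!s->present || offset > s->limit)
        return false;
    *out = s->base + offset;
    return true;
}
```

For example, the descriptor bytes FF FF 00 00 00 9A CF 00 describe a present, privilege level 0 code segment with base 0 and, because G is set, a limit of FFFFFFFFh covering the full 4 GB.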

